Abstract: The $\ell_1$ tracker obtains robustness by seeking a sparse representation of the tracked object via $\ell_1$-norm minimization [1]. However, the high computational complexity of the $\ell_1$ tracker restricts its application in real-time processing scenarios. Hence we propose Real-Time Compressed Sensing Tracking (RTCST), which exploits the signal recovery power of Compressed Sensing (CS). Dimensionality reduction and a customized Orthogonal Matching Pursuit (OMP) algorithm are adopted to accelerate the CS tracking. As a result, our algorithm achieves real-time speed that is up to 6,000 times faster than that of the $\ell_1$ tracker. Meanwhile, RTCST still produces competitive (sometimes even superior) tracking accuracy compared with the existing $\ell_1$ tracker. Furthermore, for a stationary camera, a refined tracker is designed by integrating a CS-based background model (CSBM). This CSBM-equipped tracker, coined RTCST-B, outperforms most state-of-the-art trackers with respect to both accuracy and robustness. Finally, our experimental results on various video sequences, which are verified by a new metric, Tracking Success Probability (TSP), show the excellence of the proposed algorithms.

Index Terms: Visual tracking, compressed sensing, particle filter, linear programming, hash kernel, orthogonal matching pursuit.

I. Introduction 

Within the Bayesian filter framework, the representation of the likelihood model is essential. In a tracking algorithm, the scheme of object representation determines how the target of interest is represented and how the representation is updated. A promising representation scheme should accommodate noise, occlusions and illumination changes in various scenarios. In the literature, a few representation models have been proposed to ease these difficulties [2-7]. Most tracking algorithms represent the target by a single model, typically built on extracted features such as color histograms [8,9], textures [10] and correspondence points [11]. Nonetheless, these approaches are usually sensitive to variations in target appearance and illumination, and a powerful template update method is usually needed for robustness. Other tracking algorithms train a classifier off-line [5, 12] or on-line [7] based on multiple target samples. These algorithms benefit from a robust object model, which is learned from labeled data by sophisticated learning methods.

Recently, Mei and Ling proposed a robust tracking algorithm using $\ell_1$ minimization [1]. Their algorithm, referred to as the $\ell_1$ tracker, is designed within the Particle Filter (PF) framework [13]. There, a target is expressed as a sparse representation of multiple predefined templates. The $\ell_1$ tracker demonstrates promising robustness compared with existing trackers [14-16]. However, it has the following problems. Firstly, the $\ell_1$ minimization in their work is slow. Secondly, they use an over-complete dictionary (an identity matrix) to represent the background and noise. This dictionary, in fact, can also represent any object in the video (including the objects the user wants to track). Hence it may not discriminate the objects from background and noise.

Although the $\ell_1$ tracker [1] is inspired by the face recognition work using sparse representation classification (SRC) [17], it does not make use of the sparse signal recovery power of Compressed Sensing (CS) used in [17]. CS is an emerging topic originally proposed in the signal processing community [18, 19]. It states that, with overwhelming probability, sparse signals can be exactly recovered from fewer measurements than the Nyquist-Shannon criterion requires. It has been applied to various computer vision tasks [17,20,21].

Inspired by the $\ell_1$ tracker and motivated by its problems, we propose two CS-based algorithms termed Real-Time Compressed Sensing Tracking (RTCST) and Real-Time Compressed Sensing Tracking with Background Model (RTCST-B) respectively. The new tracking algorithms are tremendously faster than the standard $\ell_1$ tracker and serve as better alternatives (in terms of both accuracy and robustness) to existing visual object trackers such as those in [7, 13, 14].

The key contributions of this work can be summarized as 
follows. 

1) We make use of the sparse signal recovery power of CS to reduce the computational complexity significantly. That is, we hash or randomly project the original features to a much lower dimensional space to accelerate the CS signal recovery procedure for tracking. Moreover, we propose a customized Orthogonal Matching Pursuit (OMP) algorithm for real-time tracking. Our algorithms are up to about 6,000 times faster than the standard $\ell_1$ tracker of [1]. In short, we make the tracker real-time by using CS.

2) We propose background templates rather than the over-complete dictionary in [1]. This further improves the robustness of the tracking, because the representations of the objects and the background are better separated. This new tracker, which is referred to as RTCST-B in this work, outperforms most state-of-the-art visual trackers with respect to accuracy while achieving even higher efficiency than RTCST.
3) Finally, we propose a new metric called Tracking Suc- 
cess Probability (TSP) to evaluate trackers' perfor- 
mance. We argue that this new metric is able to mea- 
sure tracking results quantitatively and demonstrate the 
robustness of a tracker. Consequently, all the empirical 
results are assessed by using TSP in this work. 

For ease of exposition, symbols and their denotations used 
in this paper are summarized in Table I. 

The rest of the paper is organized as follows. We briefly 
review the related literature background in the next section. 
In Section III, the proposed RTCST algorithm is presented. 
We present the RTCST-B tracker in Section IV. We verify our 
methods by comparing them against existing visual tracking 
methods in Section V. Conclusion and discussion can be found 
in the last section. 

II. Related work 

In this section, we briefly review theories and algorithms 
closest to our work. 

A. Bayesian Tracking and Particle Filters 

From a Bayesian perspective, the tracking problem is to calculate the posterior probability $p(s_k \mid y_k)$ of state $s_k$ at time $k$, where $y_k$ is the observed measurement at time $k$ [13]. In principle, the posterior PDF is obtained recursively via two stages: prediction and update. The prediction stage involves the calculation of the prior PDF:

$$p(s_k \mid y_{k-1}) = \int p(s_k \mid s_{k-1})\, p(s_{k-1} \mid y_{k-1})\, ds_{k-1}. \tag{1}$$
In the update stage, the prior is updated using Bayes' rule:

$$p(s_k \mid y_k) = \frac{p(y_k \mid s_k)\, p(s_k \mid y_{k-1})}{p(y_k \mid y_{k-1})}. \tag{2}$$

The recurrence relations (1) and (2) form the basis for the optimal Bayesian solution. Nonetheless, the above problem cannot be solved analytically without further simplification or approximation. Particle Filter (PF) is a Bayesian sequential importance sampling technique for estimating the posterior distribution $p(s_k \mid y_k)$. By introducing the so-called importance sampling distribution [8]:

$$s^i \sim q(s), \quad i = 1, \ldots, N_s, \tag{3}$$

the posterior density is estimated by a weighted approximation,

$$p(s_k \mid y_k) \approx \sum_{i=1}^{N_s} w_k^i\, \delta(s_k - s_k^i). \tag{4}$$

Here

$$w_k^i \propto w_{k-1}^i\, \frac{p(y_k \mid s_k^i)\, p(s_k^i \mid s_{k-1}^i)}{q(s_k^i \mid s_{k-1}^i, y_k)}. \tag{5}$$

For the sake of convenience, $q(\cdot)$ is commonly formed as

$$q(s_k^i \mid s_{k-1}^i, y_k) = p(s_k^i \mid s_{k-1}^i). \tag{6}$$

Therefore, (5) is simplified into

$$w_k^i \propto w_{k-1}^i\, p(y_k \mid s_k^i). \tag{7}$$



The posterior can then be updated depending only on its previous value and the observation likelihood $p(y_k \mid s_k^i)$. In addition, in order to reduce the effect of particle degeneracy [8], a resampling scheme is usually implemented as

$$\Pr(s_k^{i*} = s_k^j) = w_k^j, \quad j = 1, 2, \ldots, N_s, \tag{8}$$

where the set $\{s_k^{i*}\}_{i=1}^{N_s}$ contains the particles after resampling.

Like the $\ell_1$ tracker, both the RTCST and RTCST-B trackers use the PF framework. However, they differ in how they seek a sparse representation, which consequently leads to a different estimate of the observation likelihood $p(y_k \mid s_k^i)$.
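
The weight update (7) and the resampling rule (8) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names, the uniform fallback for all-zero weights, and the reset to uniform weights after resampling are assumptions of the example, not details taken from the trackers discussed here.

```python
import numpy as np

def update_weights(prev_weights, likelihoods):
    """Eq. (7): w_k^i is proportional to w_{k-1}^i * p(y_k | s_k^i); then normalize."""
    w = np.asarray(prev_weights, dtype=float) * np.asarray(likelihoods, dtype=float)
    total = w.sum()
    if total <= 0.0:
        # Degenerate case: fall back to uniform weights (assumption of this sketch).
        return np.full(len(w), 1.0 / len(w))
    return w / total

def resample(particles, weights, rng):
    """Eq. (8): draw N_s particles with Pr(s^{i*} = s^j) = w^j."""
    n = len(weights)
    idx = rng.choice(n, size=n, replace=True, p=weights)
    # After resampling all particles carry equal weight.
    return particles[idx], np.full(n, 1.0 / n)
```

Multinomial resampling is used here because it matches (8) directly; lower-variance schemes such as systematic resampling are common alternatives.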

B. $\ell_1$-norm Minimization-based Tracking

The underlying idea behind SRC is that, in many circumstances, an observation belonging to a certain class lies in the subspace spanned by the samples belonging to this class, and the linear representation is assumed to be sparse. Hence, reconstructing the sparse coefficients associated with the representation is crucial to identifying the observation. The coefficient recovery can be accomplished by solving a relaxed version of (13):

$$\min_x \|x\|_1, \quad \text{s.t. } \|Ax - y\|_2 \le \varepsilon, \tag{9}$$



where $x \in \mathbb{R}^n$ is the coefficient vector of interest; $A = [a_1, a_2, \ldots, a_n] \in \mathbb{R}^{d \times n}$ is sometimes dubbed the dictionary and is composed of pre-obtained pattern samples $a_i \in \mathbb{R}^d\ \forall i$; $y \in \mathbb{R}^d$ is the query/test observation; and $\varepsilon$ is the error tolerance. Then, the class identity $l(y)$ is retrieved as

$$l(y) = \arg\min_{j \in \{1, \ldots, C\}} r_j(y), \tag{10}$$

where $r_j(y) = \|y - A\,\delta_j(x)\|_2$ is the reconstruction residual associated with class $j$, $C$ is the number of classes and the function $\delta_j(x)$ sets all the coefficients of $x$ to 0 except those corresponding to the $j$th class [17].
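
As a small illustration of the decision rule (10), the sketch below computes the class-wise residuals for an already recovered coefficient vector. The function name `src_classify` and the `labels` array (one class label per dictionary column) are assumptions made for the example.

```python
import numpy as np

def src_classify(A, labels, x, y):
    """Eq. (10): return the class with the smallest residual ||y - A delta_j(x)||_2."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    residuals = []
    for c in classes:
        # delta_j(x): keep the coefficients of class c, zero out all others.
        x_c = np.where(labels == c, x, 0.0)
        residuals.append(np.linalg.norm(y - A @ x_c))
    return classes[int(np.argmin(residuals))]
```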

Given a target template set $T = [t_1, \cdots, t_{N_t}] \in \mathbb{R}^{d_0 \times N_t}$ and a noise template set $E = [I, -I] \in \mathbb{R}^{d_0 \times 2d_0}$, the $\ell_1$ tracker adopts a positive-restricted version of (14) for recovering the sparse coefficients $x$, i.e.,

$$\min_x \|x\|_1, \quad \text{s.t. } \|Ax - y\|_2 \le \varepsilon, \quad x \succeq 0. \tag{11}$$

Here $A = [T, E] \in \mathbb{R}^{d_0 \times (N_t + 2d_0)}$ is the combination of target templates and noise templates, while $x = [x_t^T, x_e^T]^T \in \mathbb{R}^{N_t + 2d_0}$ denotes the associated target coefficients and noise coefficients. Note that $N_t$ denotes the number of target templates and $d_0$ is the original dimensionality of the feature space, which equals the number of pixels of the initial target. The $\ell_1$ tracker tracks the target by integrating (11) and a template-update strategy into the PF framework. Algorithm 1 illustrates the tracking procedure. In addition, there is a heuristic approach for updating the target templates and their weights in the $\ell_1$ tracker. Refer to [1] for more details.

TABLE I: Notation

Notation               Description
$s_k$                  A dynamic state vector at time $k$
$s_k^i$                A dynamic state vector at time $k$ corresponding to the $i$th particle
$A$                    The measurement matrix or the collection of templates
$y$                    The observed target, a.k.a. observation
$x$                    The signal to be recovered in compressed sensing. For CS-based pattern recognition or tracking, it is the coefficient vector for the sparse representation
$\Phi$                 The projection matrix, which could be either a random matrix or a hash matrix in this work
$T$, $E$, $B$          The collections of target, noise and background templates
$x_t$, $x_e$, $x_b$    The coefficient vectors associated with target, noise and background templates respectively
$N_t$, $N_b$           The numbers of target templates and background templates
$d_0$, $d$             The dimensionality of the original and reduced feature space



Algorithm 1: $\ell_1$ Tracking
Input:
• Current frame $F_k \in \mathbb{R}^{h \times w}$.
• Particles $s_{k-1}^i$, $i = 1, 2, \cdots, N_s$.
• Templates set $A = [T, E] \in \mathbb{R}^{d_0 \times (N_t + 2d_0)}$.
• Templates' weight vector $\alpha$ associated with $T$.
begin
    Generate new particles $s_k^i$, $i = 1, 2, \cdots, N_s$ within the PF framework;
    for $i \leftarrow 1$ to $N_s$ do
        Obtain observation $y_k^i$ corresponding to $s_k^i$;
        Obtain $x^i$ via solving (11) with IP-based methods;
        Calculate residual: $r_i = \|y_k^i - T \cdot x_t^i\|_2$;
    end
    $i^* \leftarrow \arg\min_{1 \le i \le N_s}(r_i)$;
    Get the observed target $y_k \leftarrow y_k^{i^*}$ and its state $s_k \leftarrow s_k^{i^*}$;
    Update templates $T$ and weights $\alpha$ based on $x^{i^*}$ as in [1];
end
Output:
• Tracked target $y_k$.
• Updated target dynamic state $s_k$.
• Updated target templates $T$ and their weights $\alpha$.



C. Compressed sensing and its application in pattern recognition

CS states that an $\eta$-sparse$^1$ signal $x \in \mathbb{R}^n$ can be exactly recovered with overwhelming probability via few measurements

$$y_i = \phi_i x, \quad i = 1, \ldots, m < n.$$

Intuitively, one would obtain $x$ via

$$\min_x \|x\|_0, \quad \text{s.t. } \Phi x = y, \tag{12}$$

where $\Phi \in \mathbb{R}^{m \times n}$ is the measurement matrix, whose rows are the measurement vectors $\phi_i$, and $y = (y_1, \ldots, y_m)^T$. $\|x\|_0$ is the number of non-zero elements of $x$. Since (12) is NP-hard [22], it is commonly relaxed to

$$\min_x \|x\|_1, \quad \text{s.t. } \Phi x = y, \tag{13}$$

which can be cast as a linear programming problem.

$^1$ A signal $x$ is said to be $\eta$-sparse if there are at most $\eta$ nonzero entries in $x$.

As regards CS-based pattern recognition, to deal with noise, one could alternatively solve a Second Order Cone Program:

$$\min_x \|x\|_1, \quad \text{s.t. } \|\Phi x - y\|_2 \le \varepsilon, \tag{14}$$

where $\varepsilon$ is a pre-specified tolerance.

III. Real-time compressed sensing tracking 

In this section, we present the proposed real-time CS 
tracking. 

A. Dimension reduction 

The biggest problem of $\ell_1$ tracking is the extremely high dimensionality of the feature space, which leads to heavy computation. More precisely, suppose that the cropped image of the observation is $I \in \mathbb{R}^{h \times w}$; the dimensionality $d_0 = h \cdot w$ is typically on the order of $10^3$ to $10^5$, which prevents real-time tracking.

Fortunately, in the context of compressed sensing (ignoring the non-negativity constraint on $x$ for now), it is well known that if the measurement matrix $\Phi$ has the Restricted Isometry Property (RIP) [19], then a sparse signal $x$ can be recovered from

$$\min_x \|x\|_1, \quad \text{s.t. } \|\Phi A x - \Phi y\|_2 \le \varepsilon. \tag{15}$$

A typical choice of such a measurement matrix is the random Gaussian matrix

$$R \in \mathbb{R}^{d \times d_0}, \quad R_{ij} \sim \mathcal{N}(0, 1).$$

Besides random projection, there are other means that guarantee RIP. Shi et al. [23] proposed a hash kernel to deal with the issue of computational efficiency. Let $h_s(j, d)$ denote a hash function (i.e., the hash kernel) $h_s : \mathbb{N} \rightarrow \{1, \ldots, d\}$ drawn from a distribution of pairwise independent hash functions, where $s \in \{1, \ldots, S\}$ is the seed. Different seeds give different hash functions. Given $h_s(j, d)$, the hash matrix $H$ is defined as

$$H_{ij} = \begin{cases} 2h_s(j, 2) - 3, & h_s(j, d) = i,\ \forall s \in \{1, \ldots, S\} \\ 0, & \text{otherwise.} \end{cases} \tag{16}$$

Obviously, $H_{ij} \in \{0, \pm 1\}$. The hash kernel generates hash matrices more efficiently than conventional random matrices while maintaining similar random characteristics, which implies good RIP.

In this work, the dimensionality of the feature space is reduced by a matrix $\Phi \in \mathbb{R}^{d \times d_0}$ (which could be either the random matrix $R$ or the hash matrix $H$) from $d_0$ to $d$, where $d \ll d_0$. This significantly speeds up solving equation (14), since its complexity depends on $d$ polynomially.
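
The two kinds of projection matrix can be sketched as follows. This is a hedged illustration rather than the hash kernel of [23] itself: NumPy random draws stand in for the pairwise independent hash functions $h_s(j, d)$ and $h_s(j, 2)$, only a single seed is used, and the function names are hypothetical.

```python
import numpy as np

def gaussian_projection(d, d0, rng):
    """Random Gaussian matrix with i.i.d. N(0, 1) entries, shape (d, d0)."""
    return rng.standard_normal((d, d0))

def hash_projection(d, d0, rng):
    """Sparse sign matrix in the spirit of Eq. (16), for a single seed.

    Each column j holds exactly one non-zero entry: a random sign at a random row.
    Random draws replace real hash functions here (assumption of this sketch).
    """
    H = np.zeros((d, d0))
    rows = rng.integers(0, d, size=d0)            # stands in for h_s(j, d)
    signs = 2 * rng.integers(1, 3, size=d0) - 3   # 2 * h_s(j, 2) - 3, in {-1, +1}
    H[rows, np.arange(d0)] = signs
    return H
```

Given either matrix `Phi`, a vectorized observation `y` of length `d0` is reduced by `Phi @ y`, and the template matrix by `Phi @ A`.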

B. Customized orthogonal matching pursuit for real-time 
tracking 

1) Orthogonal matching pursuit: Before the compressed sensing theory was proposed, numerous approaches had been applied for sparse approximation in the literature of signal processing and statistics [24-26]. Orthogonal Matching Pursuit (OMP) is one of these approaches and solves (12) in a greedy fashion. Tropp and Gilbert [22] proved OMP's recoverability and showed its higher efficiency compared with linear programming, which is adopted by the original $\ell_1$ tracker of [1]. To be more explicit, given that $A \in \mathbb{R}^{d \times n}$, the computational complexity of linear programming is around $O(d^2 n^{3/2})$, while OMP can achieve as low as $O(dn)$$^2$. We implement the sparse recovery procedure of the proposed tracker with OMP so as to accelerate the tracking process.

The number of measurements required by OMP is $O(\eta \log(n))$ for $\eta$-sparse signals, which is slightly harder to achieve compared with that in $\ell_1$ minimization. However, this is merely a theoretical bound for signal recovery; no significant impact of OMP upon the tracking accuracy is observed in our experiments (see Section V).

2) Further acceleration (OMP with early stop): The OMP algorithm was proposed for recovering sparse signals exactly (see Equation (12)), and perfect recovery is guaranteed within $d$ steps [25]. However, in the realm of pattern recognition, we argue that many applications do not require perfect recovery. For example, for classification problems, test accuracy is of interest and exact recovery does not necessarily translate into high classification accuracy. On the contrary, an appropriate recovery error may even improve the accuracy of recognition [17]. We introduce a residual-based stopping criterion into OMP by modifying (12) as

$$\min_x \|x\|_0, \quad \text{s.t. } \|Ax - y\|_2 \le \varepsilon. \tag{17}$$

Moreover, the procedure of OMP can be accelerated remarkably if the above stopping criterion is enforced. To understand this, let us assume that OMP follows the MP algorithm [24] with respect to the convergence rate$^3$, i.e.,

$$r_t = \frac{K}{\sqrt{t}}, \quad t < n, \tag{18}$$

where $K$ is a positive constant and $r_t = \|Ax_t - y\|_2$ is the recovery residual after $t$ steps. Given that we relax the stopping criterion $\varepsilon$ by 10 times,

$$\varepsilon' = 10\varepsilon, \tag{19}$$

then the required number of steps $t_{stop}$ is reduced to

$$t'_{stop} = K^2 / \varepsilon'^2 = 10^{-2} K^2 / \varepsilon^2 = 10^{-2}\, t_{stop}. \tag{20}$$

Considering that the complexity of OMP is at least proportional to $t$, the algorithm could theoretically be accelerated by 100 times. Figure 1 shows the empirical influence of the terminating criterion upon the number of iterations and the running time. In our algorithm, we empirically set the stopping threshold $\varepsilon = 0.01$, which strikes a balance between speed and accuracy.

$^2$ Here, however, we do not employ the trick that Tropp and Gilbert mentioned for the least-squares routine. As a result, the OMP's complexity is higher than $O(dn)$ but still much lower than that of linear programming.

$^3$ Although the convergence rate for the MP algorithm is $O(1/\sqrt{t})$, the convergence rate for OMP remains unclear.



[Figure 1: plot titled "Running Time and Iteration Number of OMP"; x-axis: recovery error $\varepsilon$.]



Fig. 1: The decreasing tendency of running time and iteration numbers of the OMP 
procedure with different residual thresholds. The result is produced from a Matlab-based 
experiment on video "Cubicle", with the feature dimension of 50. Both the running time 
and iteration numbers are the average result over all the frames and particles. 

3) Tracking with a large number of templates: One noticeable advantage of the SRC-based tracker is the exploitation of multiple templates obtained from different frames. However, for the $\ell_1$ tracker, the number of templates $n$ should be strictly limited because it equals the dimensionality of the optimization variable $x$. To design a good $\ell_1$ tracker, a trade-off between $n$ and the optimization speed is always required. Fortunately, this dilemma does not exist when the tracker is equipped with OMP and a carefully selected sparsity $\eta$.

The computational burden of OMP consists of two steps: one selects the most correlated vector from matrix $A \in \mathbb{R}^{d \times n}$, and the other solves the least-squares fitting. In step $t$ ($t < d$), the complexity of the first step is $O(dn)$ and that of the least-squares fitting is $O(d^3 + td^2 + td)$. Accordingly, the running time of OMP is dominated by solving the least-squares problem, which is independent of the number of templates, $n$. In other words, within a certain number of iterations, the number of templates does not affect the overall running time significantly. This is an important and desirable property in the sense that we might be able to employ a large number of templates.

Admittedly, a larger $n$ might lead to more iterations. However, if we impose a maximum sparsity $\eta$, the OMP procedure lasts for only $\eta$ steps in the worst case. From this perspective, a preset $\eta \ll n$ is capable of eliminating the influence of a large $n$ upon the number of iterations. Figure 2 depicts how the running time changes with increasing $n$, given that $d \in \{50, 75\}$ and $\eta = 15$. As can be seen, the elapsed time is only doubled when $n$ is raised by $10^2$ times.



[Figure 2: plot titled "Running Time of OMP Algorithm with Numerous Templates"; x-axis: number of target templates; curves: Dimension-50 and Dimension-75.]

Fig. 2: Running time of OMP with various numbers of target templates. The experiment is carried out on video sequence "Cubicle" with reduced dimensions 50 and 75. The recorded running time is the average time consumption for one OMP procedure which calculates the observation likelihood for a particle. Note that the x-axis only indicates the number of target templates, and the number of trivial templates is not counted. The sparsity $\eta = 15$.

Inspired by this valuable finding, we aggressively set the number of target templates to 100, which is 10 times larger than that in [1]. We try to harness the large number of target templates to accommodate the variation of illumination, gesture and occlusion, and consequently improve the tracking accuracy. As regards the sparsity, we set $\eta = 0.5 \cdot d$ for RTCST and $\eta = 15$ for RTCST-B, which is introduced in Section IV. We believe these numbers are sufficiently large for the representations.

We sum up all the adjustments to OMP mentioned above in Algorithm 2. Note that here we use the inner product rather than its absolute value to measure the correlation. This heuristic is used to make the recovered coefficient vector approximately satisfy $x \succeq 0$. For RTCST-B, introduced in the next section, the absolute value of the inner product is re-employed due to the absence of the positive constraint.

C. Minor modifications 

Besides the dimension reduction methods and OMP, modifications to the original $\ell_1$ tracker are proposed in this section to achieve an even higher tracking accuracy.

1) Update templates according to sparsity concentration index: In the $\ell_1$ tracker, the template set is updated when a certain threshold of similarity is reached, i.e.,

$$\mathrm{sim}(y, a_i) < \tau, \tag{21}$$

where $i = \arg\max_j (x_j)$ and $\mathrm{sim}(y, a)$ is the function for evaluating the similarity between vectors $y$ and $a$. It can be the angle between two vectors or the SSD between them. However, Wright et al. proposed a better approach to validate the representation. The approach, which utilizes the recovered $x$ itself rather than the similarity, is termed Sparsity Concentration Index (SCI) [17]. Particularly, in the context of RTCST, the class number is 1 if the noise is not viewed as a class, so we obtain a simplified SCI measurement for the target class, which reads

$$\mathrm{SCI}_t(x) = \|x_t\|_1 / \|x\|_1 \in [0, 1], \tag{22}$$

where $x_t = x(1 : N_t)$. In the presented RTCST algorithm, $\mathrm{SCI}_t$ is employed instead of (21).

Algorithm 2: Customized OMP for Tracking
Input:
• A normalized observation $y \in \mathbb{R}^d$.
• A mapped templates set $\Phi A = [a_1, \cdots, a_n] \in \mathbb{R}^{d \times n}$.
• A recovery residual $0 < \varepsilon \ll 1$.
• A sparsity $0 < \eta \ll n$.
begin
    Initialize the residual $r_0 = y$, index set $\Lambda_0 = \emptyset$ and selected template set $\Psi_0 = \emptyset$;
    for $t \leftarrow 1$ to $\eta$ do
        $\lambda_t = \arg\max_j \langle r_{t-1}, a_j \rangle$;
        $\Lambda_t = \Lambda_{t-1} \cup \{\lambda_t\}$;
        $\Psi_t = [\Psi_{t-1}\ a_{\lambda_t}]$;
        Solve the least-squares problem: $x_t = \arg\min_x \|\Psi_t x - y\|_2$;
        Calculate the new residual: $r_t = y - \Psi_t x_t$;
        if $\|r_t\|_2 < \varepsilon$ then break;
    end
    Retrieve signal $x$ according to $x_t$ and $\Lambda_t$;
end
Output:
• Recovered coefficients $x \in \mathbb{R}^n$.
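
The steps of Algorithm 2 can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the authors' Matlab implementation: the function name, the default values, the `nonnegative` switch and the rule that an atom is never selected twice are choices made for this example, and the plain least-squares call does not enforce $x \succeq 0$ exactly.

```python
import numpy as np

def customized_omp(PhiA, y, eps=0.01, eta=15, nonnegative=True):
    """Greedy sparse recovery following Algorithm 2.

    PhiA: (d, n) matrix with normalized columns; y: (d,) normalized observation.
    eps: residual threshold for early stopping; eta: maximum sparsity.
    nonnegative=True selects atoms by the signed inner product (RTCST);
    False uses its absolute value (RTCST-B).
    """
    d, n = PhiA.shape
    residual = np.asarray(y, dtype=float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(min(eta, n)):
        corr = PhiA.T @ residual
        score = corr if nonnegative else np.abs(corr)
        if support:
            score[support] = -np.inf          # never pick an atom twice
        support.append(int(np.argmax(score)))
        # Least-squares fit of y on the atoms selected so far.
        coef, *_ = np.linalg.lstsq(PhiA[:, support], y, rcond=None)
        residual = y - PhiA[:, support] @ coef
        if np.linalg.norm(residual) < eps:    # residual-based early stop
            break
    x = np.zeros(n)
    x[support] = coef
    return x
```

Because the loop runs at most `eta` times, the cost of the least-squares step stays bounded regardless of how many templates `n` the dictionary holds.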

2) Abandoning the template weight: The original $\ell_1$ tracker enforces a template re-weighting scheme to distinguish templates by their importance [1]. Nonetheless, following their scheme, the weight of each target template is always smaller than that of the noise templates (see Algorithm 1). This does not make much sense. Actually, it may be intractable to design an ideal template re-weighting scheme that works in all circumstances. A poorly designed re-weighting scheme could even deteriorate the tracking performance. We abandon the template weight because the importance of templates can be easily exploited by the compressed sensing procedure. Without template weights, the tracker becomes simpler and less heuristic. The empirical results also show better tracking accuracy when the template weight is abandoned.

3) MAP and MSE: In Mei and Ling's framework [1], the new state corresponds to the particle with the largest observation likelihood. This method is known as Maximum A Posteriori (MAP) estimation. It is also known that, for the particle filtering framework, Mean Square Error (MSE) estimation is usually more stable than MAP. As a result, we adopt MSE in our real-time tracker, namely,

$$s_k = \frac{\sum_{i=1}^{N_s} (s_k^i \cdot l_i)}{\sum_{i=1}^{N_s} l_i}, \tag{23}$$

where $s_k^i$ is the $i$th particle at time $k$ and $l_i$ is the corresponding observation likelihood.

Algorithm 3: Real-Time Compressed Sensing Tracking
Input:
• Current frame $F_k \in \mathbb{R}^{h \times w}$.
• Particles $s_{k-1}^i$, $i = 1, 2, \cdots, N_s$.
• A dimension-reduction matrix $\Phi \in \mathbb{R}^{d \times d_0}$.
• A templates set $A = [T, E] \in \mathbb{R}^{d_0 \times (N_t + 2d_0)}$.
• A preset parameter $\lambda > 0$.
begin
    Normalize every column of $\Phi A$;
    Generate new particles $s_k^i$, $i = 1, 2, \cdots, N_s$;
    for $i \leftarrow 1$ to $N_s$ do
        Obtain mapped observation $\Phi y_k^i$ corresponding to $s_k^i$;
        Get $x^i$ via solving (24) with Algorithm 2;
        Calculate residual $r_i$ via (25);
        Calculate observation likelihood $l_i = \exp(-\lambda \cdot r_i)$;
    end
    Calculate target dynamic state $s_k$ via (23) and then get the target $y_k$;
    Recalculate $x_k$ for $y_k$ via solving (24);
    Update templates $T$ based on $x_k$ and (22);
end
Output:
• Tracked target $y_k$.
• Updated target dynamic state $s_k$.
• Updated target templates $T$.

D. The Algorithm

In a nutshell, for each observation, we utilize Algorithm 2 to recover the coefficient vector $x$ by solving the problem

$$\min_x \|x\|_0, \quad \text{s.t. } \|\Phi A x - \Phi y\|_2 \le \varepsilon, \quad x \succeq 0, \tag{24}$$

where $x = [x_t, x_e]$ and $A = [T, E]$. The residual is then obtained by

$$r = \|\Phi y - \Phi T x_t\|_2. \tag{25}$$

Finally the likelihood of this observation is updated as

$$l = \exp(-\lambda \cdot r), \quad \lambda > 0. \tag{26}$$
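
The residual (25), the likelihood (26) and the state estimate of (23) can be sketched as follows. The function names are hypothetical, and normalizing the likelihoods by their sum when averaging the particle states is an assumption of this sketch.

```python
import numpy as np

def observation_likelihood(Phi_y, Phi_T, x_t, lam):
    """Eqs. (25)-(26): r = ||Phi y - Phi T x_t||_2 and l = exp(-lam * r)."""
    r = np.linalg.norm(Phi_y - Phi_T @ x_t)
    return np.exp(-lam * r)

def mse_state(particles, likelihoods):
    """Likelihood-weighted mean of the particle states.

    particles: (N_s, state_dim) array; likelihoods: (N_s,) array.
    The weights are normalized by their sum (assumption of this sketch).
    """
    p = np.asarray(particles, dtype=float)
    l = np.asarray(likelihoods, dtype=float)
    return (l[:, None] * p).sum(axis=0) / l.sum()
```

Only the target part `x_t` of the recovered coefficients enters the residual, so energy explained by the noise (or background) templates lowers the likelihood of a particle.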

The procedure of the Real-Time Compressed Sensing Tracking algorithm is summarized in Algorithm 3. Our template update scheme is given in Algorithm 4. As can be seen, the proposed update scheme is much more concise than that in the $\ell_1$ tracker [1] thanks to the abandonment of the template weight. The empirical performance of RTCST is verified in Section V.
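
The template update of Algorithm 4, driven by the SCI measurement (22), might be written as in the sketch below. The function names are illustrative, and returning a modified copy of the template matrix (instead of updating it in place) is a design choice of this example.

```python
import numpy as np

def sci_target(x, n_target):
    """Eq. (22): share of the l1 mass of x carried by the target coefficients."""
    total = np.abs(x).sum()
    return np.abs(x[:n_target]).sum() / total if total > 0 else 0.0

def update_templates(T, x, y, tau):
    """Algorithm 4: if SCI_t(x) < tau, replace the target template that has
    the smallest coefficient with the newly observed target y."""
    n_target = T.shape[1]
    if sci_target(x, n_target) < tau:
        j = int(np.argmin(x[:n_target]))
        T = T.copy()
        T[:, j] = y
    return T
```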

Algorithm 4: Template Update Scheme for RTCST
Input:
• Sparse coefficient $x = x_k$ in Alg. 3.
• Observed target $y_k$.
• Target templates set $A = [a_1, a_2, \cdots, a_{N_t}] \in \mathbb{R}^{d_0 \times N_t}$.
• A preset parameter $0 < \tau < 1$.
begin
    if $\mathrm{SCI}_t(x) < \tau$ then
        $j^* \leftarrow \arg\min_{1 \le j \le N_t}(x_j)$;
        $a_{j^*} \leftarrow y_k$, where $a_{j^*}$ is the $j^*$th target template;
    end
end
Output:
• Updated target templates $A$.

IV. RTCST-B: More Robust and Efficient RTCST with Background Model

To some extent, visual tracking can be viewed as an object detection task with prior information. Similar to object detection, which is sometimes treated as a classification problem, visual tracking also distinguishes the foreground (target) from the background. In detection applications, the background class is usually considered to have no distinct features because it could follow any pattern. On the contrary, in the context of visual tracking, the background is much more limited with respect to appearance variation. Particularly, for a stationary camera, the background is nearly fixed. Under these assumptions, it is worthwhile exploiting the background information for tracking. Indeed, appropriate incorporation of a background model improves the tracking performance [7, 27-29].

We hereby propose a novel CS-based background model (CSBM) to facilitate the tracking algorithm. The definition of the CS-based background model is quite simple. Suppose that $\Gamma_i \in \mathbb{R}^{h \times w}$, $i = 1, \cdots, N_b$, is the $i$th frame where the foreground is absent, and $h$ and $w$ are the height and width of the frame respectively; we define the background model as

$$G = \{\Gamma_1, \ldots, \Gamma_{N_b}\}, \tag{27}$$

or in short, the collection of backgrounds. The background templates are then generated from the CSBM to cooperate with the target templates in our new tracker.

Please note that our algorithm is unrelated to the background subtraction approach proposed by Volkan et al. [20]. In their paper, foreground silhouettes are recovered via a CS procedure but the background subtraction is still performed in the conventional way. Our CSBM and RTCST-B are entirely different from their approach, both in essence and in appearance. The details of CSBM and its incorporation into RTCST are introduced below.

A. Building the Optimal CSBM 

A good CSBM should consist only of "pure" backgrounds and contain sufficiently large appearance variation, e.g., illumination changes. Ideally, we could simply select a certain number of foreground-absent frames from the video sequence to build a CSBM. However, "pure" backgrounds are usually difficult to find, and it is even harder to obtain ones that cover the main distribution of background appearance.

An intuitive way to obtain a clean background is to replace the foreground of one frame with a background patch cropped from another frame. More precisely, let $F \in \mathbb{R}^{h \times w}$ denote the frame from which the background is retrieved, and $F' \in \mathbb{R}^{h \times w}$ stand for the frame from which the background patch is cropped. Suppose that the foreground region in $F$ is $F(t : b,\ l : r)$$^4$; the patching operation could be described as

$$F(t : b,\ l : r) = F'(t : b,\ l : r).$$




Fig. 3: An illustration of retrieving the background. (a) shows a contaminated background with foreground regions marked by red rectangles; (b) is the frame from which the background patches are obtained (the blue rectangles indicate the foregrounds in (b), which are far from the foreground areas in (a)); (c) shows a retrieved background based on (a) and (b). The frames are captured from video sequence pets2000_cl.

In practice, multiple foreground regions need to be mended for each "impure" background candidate. Furthermore, a selection approach should be conducted to form the optimal combination of the retrieved backgrounds over all the candidates. To achieve this goal, we first randomly capture $N' > N_b$ frames from the video sequence concerned. Afterwards, every foreground region of the frames is located manually. The foreground is then replaced by a clean background region cropped from the nearest frame (in terms of frame index). Finally, a $k$-median clustering algorithm is carried out to select the most comprehensive backgrounds.

It is worth noting that even if some backgrounds are not perfectly retrieved, i.e., minor foreground remains are left, CSBM can still work well considering that CS is robust to noise in the measurements [19].
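
The mending step described above can be sketched as follows. The box convention (top, bottom, left, right, with exclusive bottom and right), the overlap test used to decide whether a donor frame is "clean" at that location, and the function names are all assumptions of this illustration; the clustering step is not covered.

```python
import numpy as np

def boxes_overlap(a, b):
    """Boxes are (top, bottom, left, right) with exclusive bottom/right."""
    return a[0] < b[1] and b[0] < a[1] and a[2] < b[3] and b[2] < a[3]

def mend_frame(frames, boxes, idx):
    """Replace every foreground box of frames[idx] by the same region taken
    from the nearest frame (by index) whose own foreground does not overlap it.

    frames: list of (h, w) arrays; boxes: per-frame lists of foreground boxes.
    """
    out = frames[idx].copy()
    order = sorted(range(len(frames)), key=lambda j: abs(j - idx))
    for box in boxes[idx]:
        for j in order:
            if j != idx and not any(boxes_overlap(box, other) for other in boxes[j]):
                t, b, l, r = box
                out[t:b, l:r] = frames[j][t:b, l:r]
                break
    return out
```

If no donor frame is clean at a given box, the sketch leaves that region untouched.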

B. Equipping RTCST with CSBM

We equip RTCST with CSBM to build a novel visual tracker, namely the Real-Time CS-based Tracker with Background Model (RTCST-B). In RTCST-B, the original noise templates are replaced by background templates which are generated from CSBM. In the context of PF tracking, given an observation position $S$ with $d_0$ pixels and a CSBM $G$ defined in (27), the background templates set $B$ is obtained by:

$$B = [b_1\ b_2\ \ldots\ b_{N_b}] \in \mathbb{R}^{d_0 \times N_b}, \qquad b_i = \mathrm{CV}(\Gamma_i, S)\ \ \forall i = 1, \ldots, N_b,$$

where the function $\mathrm{CV}(\cdot)$ is called the crop-vectorize operation, which first crops the region indicated by $S$ from background $\Gamma_i$ and then vectorizes it into $b_i \in \mathbb{R}^{d_0}$. Eventually, the optimization problem for RTCST-B reads:

$$\min_x \|x\|_0, \quad \text{s.t. } \|\Phi A x - \Phi y\|_2 \le \varepsilon, \tag{30}$$

where $x$ is comprised of $x_t$ and $x_b$, i.e., the coefficient vectors for target and background, and $A = [T, B] \in \mathbb{R}^{d_0 \times (N_t + N_b)}$.
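
The crop-vectorize operation and the assembly of the mapped dictionary $\Phi A$ with $A = [T, B]$ can be sketched as below. Representing the observation position $S$ by an axis-aligned box (top, bottom, left, right) is an assumption of this sketch, as are the function names; a PF state may describe a more general region.

```python
import numpy as np

def crop_vectorize(background, region):
    """CV(Gamma_i, S): crop the region S from a background and flatten it."""
    t, b, l, r = region
    return background[t:b, l:r].reshape(-1)

def build_dictionary(T, backgrounds, region, Phi):
    """Form A = [T, B], map it with Phi and normalize every column."""
    B = np.column_stack([crop_vectorize(g, region) for g in backgrounds])
    PhiA = Phi @ np.hstack([T, B])
    norms = np.linalg.norm(PhiA, axis=0)
    norms[norms == 0] = 1.0   # leave all-zero columns untouched
    return PhiA / norms
```

Because the background templates depend on the observation position, `B` has to be rebuilt for each particle, whereas `T` and `Phi` are shared.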

Although the optimization problem is different, the calculation of the likelihood remains the same as in (26). To understand this, let $x_t$ and $x_b$ denote the coefficients associated with the target templates and background templates respectively, and let $p(y_k \mid s) = p(y_k \mid x_t) = \exp(-\lambda r)$ be the observation likelihood$^5$, where $r$ is defined in (25). Then we have:

$$p(y_k \mid x_t, x_b) = p(y_k \mid x_t) = \exp(-\lambda r) \tag{31}$$

with the assumption that $x_t$ and $x_b$ are determined by each other, i.e.,

$$p(x_b, x_t) = p(x_t) = p(x_b), \tag{32}$$

or in other words, the solution of the CS procedure is unique [19].

In addition, the template update scheme should be changed
slightly because a new class is involved. More precisely,
target templates are updated only when

$$\mathrm{SCI}_{tb}(x) = \frac{\max\{\|x_t\|_1, \|x_b\|_1\}}{\|x\|_1} < \tau \qquad (33)$$

Finally, the positive constraint on $x$ is removed in (30)
because background subtraction implies negative coefficients
for background templates. It is therefore reasonable not to
constrain the sign of the coefficients in RTCST-B.

In summary, one only needs to apply the following minor
modifications to RTCST to turn it into RTCST-B.

1) Substitute the background templates for noise templates.

2) Eliminate the positive constraint.

3) Conduct the CV operation for each observation.

4) Utilize the new SCI measurement.

Apparently, the difference between RTCST and RTCST-B is
not significant with respect to formulation. Nevertheless, this
seemingly small change makes RTCST-B markedly superior
to its prototypes.
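The crop-vectorize operation and the SCI measurement in (33) can be sketched in a few lines. This is only a minimal illustration in plain Python, not the authors' Matlab implementation: the function names, the list-of-rows image layout, the inclusive pixel bounds of the box `(l, r, t, b)` and the value returned for an all-zero coefficient vector are assumptions made here for the example.

```python
def crop_vectorize(image, box):
    """CV operation: crop box = (l, r, t, b) from a 2-D image given as a
    list of rows, then flatten the patch row by row into one vector.
    Inclusive pixel bounds are an assumption of this sketch."""
    l, r, t, b = box
    return [image[y][x] for y in range(t, b + 1) for x in range(l, r + 1)]


def background_templates(backgrounds, box):
    """Columns I_1 .. I_Nb of B: one cropped vector per background image."""
    return [crop_vectorize(bg, box) for bg in backgrounds]


def sci_tb(x_t, x_b):
    """max(||x_t||_1, ||x_b||_1) / ||x||_1 for x = [x_t; x_b], as in (33)."""
    n_t = sum(abs(v) for v in x_t)
    n_b = sum(abs(v) for v in x_b)
    total = n_t + n_b
    if total == 0.0:
        return 0.0  # assumption: an all-zero solution carries no class evidence
    return max(n_t, n_b) / total


# Toy usage: two 4x4 "backgrounds" and a 2x2 observation window.
image = [[10 * y + x for x in range(4)] for y in range(4)]
B_columns = background_templates([image, image], (1, 2, 0, 1))
score = sci_tb([0.5, 0.25], [-0.25])  # negative background weights are allowed
```

Because the positive constraint is dropped in RTCST-B, the sketch takes absolute values when forming the ℓ1 norms, so negative background coefficients still count toward the background class.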

C. Superiority Analysis

Compared with the $\ell_1$ tracker and RTCST, RTCST-B enjoys
three main advantages, which are described as follows.

1) More Sparse: An underlying assumption behind the $\ell_1$
tracker and RTCST is that the background can be sparsely
represented by the noise templates in $E$. This is true when the foreground
dominates the observed rectangle. More quantitatively, given that
$\eta_t$ is the sparsity of the target coefficient vector $x_t$, when

$$\eta_t + \|x_e\| < d/3$$

the representation based on the solution $x$ in (24) is guaranteed
to be reliable [17]. Nonetheless, the sparse representation is
no longer valid when the background covers the main part
of the observation. Predictably, the incorrect representation will
degrade tracking accuracy.

On the other hand, after the noise templates are replaced by
background templates, the aforementioned assumption usually
holds. Figure 4 gives an explicit demonstration of the
sparsity of the solutions.



4 In this paper, all the target or foreground is represented as a rectangle 
region 



5 It is trivial to prove that the relationship between particle $s$ and $x_t$ is
deterministic given a specific frame image.

Fig. 4: A demonstration of the sparse solutions for RTCST and RTCST-B. (a) and (b)
are the tracking results by RTCST and RTCST-B on the same frame (captured from
pets2004_p1). (c) and (d) are the recovered signals for RTCST and RTCST-B respectively.
The representation by RTCST-B is much sparser than that by RTCST. Note that
here, $d = 50$, $N_t = 100$ and $N_b = 10$ for RTCST-B.



2) More Efficient: Compared with existing background
models, the computational burden of CSBM is extremely light.
First of all, there is no need to conduct background
subtraction or foreground connection in RTCST-B, because
these two functions are integrated within the CS procedure
implicitly. Secondly, if the CSBM is generated properly, i.e.,
it covers the main distribution of the background's appearance,
updating the model becomes unnecessary. Thirdly, the sufficient
number of background templates is much smaller than that of
noise templates, i.e.,

$$N_b \ll N_n = 2d$$

where $N_n$ is the number of noise templates. The reduction in the
number of templates immediately speeds up the optimization
process. The last, and most important, reason is that the
required sparsity $\eta$ for RTCST-B is much smaller than that for
RTCST (see Section III-B3). This leads to an earlier termination of the
OMP procedure in RTCST-B and hence makes it faster. In
conclusion, the introduction of CSBM does not impose any further
computational burden on the algorithm; on the contrary,
the tracking procedure is accelerated to some extent.

3) More Robust: In RTCST and the $\ell_1$ tracker, one tries to
use the noise templates $E = [I, -I]$ to represent the background.
However, the columns of $I$, which are the standard
basis vectors, do not favor background images over targets.
This property makes RTCST and the $\ell_1$ tracker powerless at
recognizing background and consequently decreases the tracking
accuracy. Differing from the prototype, RTCST-B harnesses
the discriminative nature of CS-based pattern recognition. Both
foreground (target) and background are treated as a typical
class with distinct features. In RTCST-B, target templates
compete against background templates, which are as powerful
as their competitors, to "attract" the observation. Intuitively,
the more discriminative templates will make RTCST-B more
robust.

Moreover, once the tracked region drifts away, background
information is brought into the target templates via template
update (which is almost unavoidable). In this situation,
for RTCST and the $\ell_1$ tracker, some target templates could be
more similar to the background than all the noise templates. This
leads to a serious classification ambiguity and therefore poor
tracking performance. In contrast, RTCST-B can
draw the target back to the correct position thanks to its
capacity for recognizing background. In plain words, RTCST-B
always tends to locate the target in the region that does not
look like background. Empirical evidence for the robustness
of RTCST-B is shown in Figure 5.




Fig. 5: Empirical evidence for the robustness of RTCST-B against drift. (a) to (d)
are the tracking results for RTCST-B, compared with images (e) to (h), which are the
results for RTCST on the same frames from pets2000. The tracked target is marked by a
red rectangle. We can see that the drift tendency shown in (b) is curbed in the successive
frames. In contrast, in the bottom row, the drift effect grows dramatically.



V. Experiment 

A. Experiment Setting 

To verify the proposed tracking algorithms, we design a
series of experiments that examine them in
terms of accuracy, efficiency and robustness. The proposed
algorithms are run on 10 video sequences and compared
with the $\ell_1$ tracker, the Kernel-Mean-Shift (KMS) tracker [14] and
the color-based PF tracker [13]. The details of the selected video
sequences are listed in Table II. Note that we run the $\ell_1$
tracker on only 5 videos, namely cubicle, dp, car11, pets2001_c1
and pets2004-2_p1, because for the other videos
the convex optimization problem is too slow to solve
(above 5 minutes per frame).

There are two alternative dimension-reduction methods for
RTCST and RTCST-B, namely random projection and hash
matrix projection. In our experiments, both of them are
performed with reduced dimensions of 25, 50 and 100. As regards
the number of particles, we examine the proposed trackers with
100 and 200 particles, and the PF tracker with 100,
200 and 500 particles. All the PF-based trackers are run 20 times,
except the $\ell_1$ tracker, which is run only 3 times.
We run the KMS tracker only once because it is a
deterministic method. The average values and standard errors
are reported in this section. The KMS tracker, PF tracker and $\ell_1$
tracker are implemented in C++ while our CS-based trackers
are implemented in Matlab. To compare the efficiency with
the proposed algorithms, there is also a Matlab version of the
$\ell_1$ tracker. All the algorithms are run on a PC with a 2.6 GHz
quad-core CPU and 4 GB memory (we use only one core of
it). As to the software, we use Matlab 2009a, and the linear
programming solver is called from Mosek 6.0 [30].

TABLE II: The details of the video sequences employed in our experiment.
The tracking frames refer to the concerned frame indices for each video; the initial position
indicates the minimum bounding box for the target in the first frame; if "Yes" appears
in the last column, the video is captured from a stationary camera and consequently
suits RTCST-B.

Sequence      | Tracking frames | Initial position     | Stationary camera
cubicle       | 1 ~ 51          | [56, 24, 90, 67]     | No
dp            | 1 ~ 66          | [91, 25, 116, 57]    | No
car4          | 1 ~ 300         | [139, 102, 356, 283] | No
car11         | 1 ~ 393         | [69, 123, 104, 157]  | No
fish          | 1 ~ 200         | [122, 57, 208, 148]  | No
pets2000_c1   | 122 ~ 312       | [536, 318, 743, 432] | Yes
pets2001_c1   | 1550 ~ 1635     | [8, 272, 46, 296]    | Yes
pets2002_p1   | 275 ~ 500       | [578, 92, 641, 172]  | Yes
pets2004_p1   | 115 ~ 550       | [193, 258, 251, 287] | Yes
pets2004-2_p1 | 1 ~ 201         | [181, 224, 239, 262] | Yes

It is important to emphasize that in our experiment, no
trick is used for selecting the target region in the first frame.
The initial target region is always the minimum rectangle
$R = [l, r, t, b]$ that covers the whole target⁶, where $l$,
$r$, $t$ and $b$ are the coordinates (horizontal or vertical) of the
left, right, top and bottom boundaries respectively. This rigid
rule is followed to eliminate artificial factors in visual
tracking and to keep the comparison unbiased.

B. TSP — A New Metric of Tracking Robustness 

A conventional way to verify tracking
accuracy is the tracking error. Specifically, given that the centroid
of the ground truth region is $c_g$ while that of the tracked region is $c_t$,
the tracking error $p$ is defined as

$$p = \|c_g - c_t\|_2, \qquad (34)$$

i.e., the Euclidean distance between the two centroids. However, if
we take scale variation into consideration, $p$ is a poor measure of a
tracker's performance. See Figure 6(a) for an example.
In the image, the red rectangle indicates the ground truth for a
moving car. The blue and gray rectangles, which are obtained
by different tracking algorithms, share an identical centroid.
Under tracking error, the same performance is reported for
both trackers despite the obvious difference in tracking
accuracy.
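The weakness of the centroid-based error can be reproduced with a few lines of Python. The sketch below is illustrative only: the box coordinates are made-up numbers (not taken from Figure 6), and the helper names are hypothetical. Boxes follow the $[l, r, t, b]$ convention used in the text.

```python
import math


def centroid(box):
    """Centre of a box given as (l, r, t, b)."""
    l, r, t, b = box
    return ((l + r) / 2.0, (t + b) / 2.0)


def tracking_error(box_g, box_t):
    """Euclidean distance between the two centroids, as in (34)."""
    (xg, yg), (xt, yt) = centroid(box_g), centroid(box_t)
    return math.hypot(xg - xt, yg - yt)


# Hypothetical boxes: a tight result and a far-too-large result that
# happen to share the centroid of the ground truth.
ground_truth = (100, 200, 50, 110)
tight_result = (105, 195, 55, 105)
loose_result = (50, 250, 20, 140)

err_tight = tracking_error(ground_truth, tight_result)
err_loose = tracking_error(ground_truth, loose_result)
```

Both errors are zero here even though the second box covers several times the area of the target, which is exactly the scale blindness described above.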

Inspired by the evaluation method proposed for the PASCAL
database [31], we propose a new tracking accuracy measurement
termed Tracking Success Probability (TSP). To
define TSP, first suppose that the bounding
box of the ground truth region is $R_g = [l_g, r_g, t_g, b_g]$ and the
one for the tracked region is $R_t = [l_t, r_t, t_t, b_t]$. We then design a
function $a(R_g, R_t) \in [-1, 1]$ to estimate the overlapping state
between $R_g$ and $R_t$. Given two distance sets

$$H = \{r_t - l_g,\ r_g - l_t,\ r_g - l_g,\ r_t - l_t\}$$
$$V = \{b_t - t_g,\ b_g - t_t,\ b_g - t_g,\ b_t - t_t\}$$

and an indicator function

$$s_{tg} = \begin{cases} -1, & R_g \text{ and } R_t \text{ are separate} \\ 1, & \text{otherwise,} \end{cases}$$

$a(R_g, R_t)$ writes⁷

$$a(R_g, R_t) = s_{tg} \cdot \frac{\min(H) \cdot \min(V)}{\max(H) \cdot \max(V)} \qquad (35)$$

It is easy to see that when the two regions overlap each other,
$a(R_g, R_t)$ is the ratio of the intersection area $R_{g \cap t}$ to the area
$R^*$, which is the minimum region covering both $R_g$ and $R_t$.
See Figure 6(b) for an example. Finally, TSP is formulated as

$$\mathrm{TSP}(R_g, R_t) = \frac{\exp(\nu \cdot a(R_g, R_t))}{1 + \exp(\nu \cdot a(R_g, R_t))} \in [0, 1], \qquad (36)$$

where $\nu > 0$ is a preset parameter that reflects the worst scenario
in which we can still be sure the target is located correctly. In our
experiment, $\nu$ is the solution of

$$\frac{\exp(0.25\nu)}{1 + \exp(0.25\nu)} = 0.95 \implies \nu \approx 11.8. \qquad (37)$$

In other words, when the overlapped region is larger than 25%
of region $R^*$, we are convinced (with probability
0.95) that the tracking is successful.
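The definitions in (35) to (37) translate directly into code. The following Python sketch is a hedged illustration rather than the authors' implementation. Two details are assumptions made here: boxes are treated as separate when the smallest horizontal or vertical distance is not positive, and the numerator is taken in absolute value so that separated boxes always score at most zero. Boxes are `(l, r, t, b)` with the image origin at the top-left corner and are assumed to have positive width and height.

```python
import math


def overlap_score(box_g, box_t):
    """a(R_g, R_t) in [-1, 1], following (35)."""
    lg, rg, tg, bg = box_g
    lt, rt, tt, bt = box_t
    H = [rt - lg, rg - lt, rg - lg, rt - lt]  # horizontal distances
    V = [bt - tg, bg - tt, bg - tg, bt - tt]  # vertical distances
    separate = min(H) <= 0 or min(V) <= 0      # assumption: touching counts as separate
    s = -1.0 if separate else 1.0
    # abs() is an assumption of this sketch; see the lead-in above.
    return s * abs(min(H) * min(V)) / (max(H) * max(V))


def solve_nu(overlap=0.25, confidence=0.95):
    """Closed-form solution of exp(o*nu) / (1 + exp(o*nu)) = confidence."""
    return math.log(confidence / (1.0 - confidence)) / overlap


def tsp(box_g, box_t, nu):
    """Tracking Success Probability (36), written as a logistic function."""
    return 1.0 / (1.0 + math.exp(-nu * overlap_score(box_g, box_t)))


nu = solve_nu()                                        # about 11.8, as in (37)
quarter = tsp((0, 10, 0, 10), (0, 10, 0, 40), nu)      # overlap ratio exactly 0.25
disjoint = tsp((0, 10, 0, 10), (20, 30, 0, 10), nu)    # no overlap
```

Since (37) is a logistic equation, $\nu = \ln(0.95 / 0.05) / 0.25 = 4 \ln 19 \approx 11.78$, which matches the reported value of 11.8.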

Obviously, the larger the TSP is, the more confident we
are that the tracking is successful. If we apply TSP to the
tracking results shown in Figure 6(a), the TSP of the blue
rectangle is 0.95, which is significantly larger than that of
the gray one (0.55). The difference implies
that TSP is capable of accommodating dynamic factors besides
displacement. Another merit of TSP is its comparability across
different video sequences thanks to its fixed value range, i.e.,
[0, 1]. Considering these advantages, in the current paper all
the empirical results are evaluated by TSP. As a reference,
tracking error results are also available.




Fig. 6: A demonstration of two measurements of tracking accuracy: (a) tracking error, (b) TSP.
(a) shows the poor capacity of $p$. (b) illustrates the definition of TSP. $R_g$ and $R_t$ are illustrated as red and
blue rectangles respectively; the region $R^*$ is a gray dashed square in the image while the
intersection region $R_{g \cap t}$ is shown in purple. We can see that in this case, $a(R_g, R_t) = R_{g \cap t} / R^*$.
These two frames are obtained from video sequences pets2001 and pets2002
respectively.



C. Tracking Accuracy 

Firstly, we examine the tracking accuracy of our trackers
in comparison with the competitors. The average TSP for every
experiment is shown in Table III. For each video sequence,
the optimal accuracy is displayed in bold type.



6 Shadows are not taken into consideration. 



7 Here, we suppose the origin of the image is at the top-left corner.

TABLE III: TSP values for tracking experiments. The term "R-Dx-Rand" stands for RTCST with x-dimension features generated by random projection, while a row
starting with "RB-..." refers to the results with RTCST-B. "PNx" indicates that x particles are used in the tracker. The optimal value for each video sequence is illustrated in bold type.

Tracker      | PN    | cubicle       | dp            | car4          | car11         | fish          | pets2000_c1   | pets2001_c1   | pets2002_p1   | pets2004_p1   | pets2004-2_p1
KMS          |       | 77 ± 32.7     | 100 ± 0.4     | 24 ± 29.5     | 67 ± 40.5     | 98 ± 1.4      | 94 ± 4.8      | 23 ± 14.6     | 23 ± 37.0     | 52 ± 30.2     | 26 ± 32.8
PF           | PN100 | 95 ± 8.1      | 98 ± 3.2      | 64 ± 30.0     | 37 ± 30.6     | 90 ± 14.2     | 45 ± 31.5     | 97 ± 2.7      | 24 ± 38.2     | 23 ± 25.8     | 58 ± 16.9
PF           | PN200 | 95 ± 8.3      | 98 ± 3.0      | 65 ± 30.3     | 39 ± 31.8     | 90 ± 14.7     | 44 ± 31.7     | 98 ± 2.5      | 24 ± 38.4     | 23 ± 25.9     | 58 ± 16.9
PF           | PN500 | 95 ± 7.9      | 98 ± 2.9      | 64 ± 33.6     | 39 ± 32.7     | 90 ± 15.4     | 44 ± 33.1     | 98 ± 2.5      | 24 ± 38.4     | 22 ± 25.8     | 58 ± 17.0
R-D25-Rand   | PN100 | 69 ± 21.4     | 66 ± 20.1     | 89 ± 11.0     | 64 ± 17.2     | 63 ± 20.1     | 77 ± 8.9      | 89 ± 8.3      | 54 ± 21.6     | 31 ± 25.8     | 33 ± 29.1
R-D25-Rand   | PN200 | 80 ± 15.8     | 78 ± 15.7     | 95 ± 7.5      | 62 ± 20.7     | 64 ± 20.2     | 80 ± 8.5      | 87 ± 10.1     | 63 ± 16.0     | 28 ± 25.6     | 29 ± 31.1
R-D50-Rand   | PN100 | 73 ± 21.5     | 78 ± 16.7     | 95 ± 8.1      | 64 ± 24.1     | 61 ± 21.0     | 72 ± 10.1     | 86 ± 12.5     | 65 ± 16.1     | 28 ± 25.3     | 25 ± 33.1
R-D50-Rand   | PN200 | 69 ± 23.0     | 82 ± 17.9     | 95 ± 10.7     | 81 ± 22.3     | 64 ± 19.0     | 81 ± 9.1      | 83 ± 13.4     | 64 ± 15.1     | 31 ± 25.2     | 25 ± 32.9
R-D100-Rand  | PN100 | 70 ± 24.7     | 71 ± 21.5     | 94 ± 11.2     | 85 ± 24.6     | 64 ± 19.3     | 72 ± 12.5     | 93 ± 5.1      | 61 ± 16.1     | 28 ± 27.2     | 26 ± 32.0
R-D100-Rand  | PN200 | 72 ± 22.3     | 77 ± 17.6     | 96 ± 8.8      | 78 ± 23.1     | 59 ± 20.6     | 81 ± 8.7      | 91 ± 6.9      | 68 ± 13.6     | 32 ± 25.7     | 24 ± 33.2
R-D25-Hash   | PN100 | 73 ± 21.3     | 76 ± 12.0     | 90 ± 12.1     | 65 ± 24.4     | 64 ± 19.9     | 83 ± 6.4      | 77 ± 20.3     | 67 ± 15.0     | 38 ± 25.0     | 32 ± 29.6
R-D25-Hash   | PN200 | 77 ± 18.3     | 81 ± 14.6     | 89 ± 14.8     | 59 ± 23.9     | 63 ± 20.2     | 96 ± 2.7      | 70 ± 23.5     | 55 ± 19.6     | 35 ± 25.8     | 33 ± 29.8
R-D50-Hash   | PN100 | 73 ± 21.8     | 79 ± 16.5     | 98 ± 3.2      | 75 ± 24.1     | 66 ± 21.3     | 73 ± 11.8     | 100 ± 0.1     | 64 ± 16.0     | 34 ± 25.3     | 22 ± 32.3
R-D50-Hash   | PN200 | 75 ± 21.7     | 83 ± 14.2     | 99 ± 1.2      | 74 ± 22.5     | 68 ± 21.6     | 79 ± 10.5     | 100 ± 0.1     | 63 ± 15.7     | 39 ± 24.4     | 21 ± 33.5
R-D100-Hash  | PN100 | 82 ± 15.0     | 88 ± 10.1     | 95 ± 9.1      | 80 ± 32.9     | 56 ± 22.0     | 91 ± 4.6      | 100 ± 0.1     | 64 ± 14.5     | 30 ± 25.9     | 27 ± 31.6
R-D100-Hash  | PN200 | 90 ± 8.3      | 92 ± 8.5      | 95 ± 9.3      | 80 ± 33.6     | 52 ± 23.8     | 92 ± 5.3      | 100 ± 0.1     | 67 ± 13.3     | 30 ± 26.3     | 28 ± 31.8
RB-D25-Rand  | PN100 | -             | -             | -             | -             | -             | 76 ± 7.0      | 86 ± 9.5      | 80 ± 8.8      | 68 ± 18.2     | 49 ± 23.2
RB-D25-Rand  | PN200 | -             | -             | -             | -             | -             | 92 ± 3.4      | 84 ± 12.1     | 78 ± 10.2     | 62 ± 17.4     | 59 ± 19.2
RB-D50-Rand  | PN100 | -             | -             | -             | -             | -             | 86 ± 5.8      | 98 ± 2.0      | 73 ± 11.8     | 58 ± 18.4     | 44 ± 26.5
RB-D50-Rand  | PN200 | -             | -             | -             | -             | -             | 93 ± 3.6      | 97 ± 2.7      | 77 ± 10.8     | 58 ± 18.3     | 62 ± 17.8
RB-D100-Rand | PN100 | -             | -             | -             | -             | -             | 96 ± 4.2      | 100 ± 0.6     | 74 ± 11.6     | 46 ± 24.0     | 54 ± 20.8
RB-D100-Rand | PN200 | -             | -             | -             | -             | -             | 95 ± 5.0      | 100 ± 0.1     | 72 ± 11.8     | 51 ± 21.7     | 53 ± 22.0
RB-D25-Hash  | PN100 | -             | -             | -             | -             | -             | 89 ± 2.9      | 94 ± 6.1      | 79 ± 10.3     | 64 ± 20.7     | 71 ± 14.4
RB-D25-Hash  | PN200 | -             | -             | -             | -             | -             | 89 ± 3.6      | 89 ± 8.9      | 77 ± 10.5     | 61 ± 16.1     | 77 ± 12.0
RB-D50-Hash  | PN100 | -             | -             | -             | -             | -             | 75 ± 12.0     | 98 ± 1.7      | 82 ± 9.0      | 42 ± 25.3     | 52 ± 22.7
RB-D50-Hash  | PN200 | -             | -             | -             | -             | -             | 98 ± 1.9      | 98 ± 1.7      | 82 ± 8.7      | 59 ± 19.5     | 71 ± 14.0
RB-D100-Hash | PN100 | -             | -             | -             | -             | -             | 97 ± 1.9      | 99 ± 1.3      | 82 ± 8.9      | 51 ± 20.8     | 67 ± 14.7
RB-D100-Hash | PN200 | -             | -             | -             | -             | -             | 99 ± 1.4      | 98 ± 1.7      | 82 ± 9.8      | 53 ± 22.2     | 71 ± 13.1
L1T          |       | 99 ± 2.2      | 92 ± 8.8      |               | 77 ± 37.4     |               |               | 100 ± 0.0     |               | 34 ± 26.4     |

As illustrated in Table III, all the tracking approaches
achieve similar performance on the sequences with simple
background and stable illumination (dp and cubicle). For the
video sequence fish, traditional methods show a higher capacity
for accommodating extreme illumination variation. On the
other hand, for the outdoor-scene and complex-background
tasks, i.e., the other 7 sequences, CS-based trackers
consistently outperform the PF tracker and the KMS tracker. All the best
performances on these video sequences are observed with RTCST
and RTCST-B. Considering that the target can be
viewed as missed when the TSP is below 30%, the traditional
trackers fail on the majority of these video datasets,
i.e., the KMS tracker on car4, pets2001_c1, pets2002_p1 and
pets2004-2_p1; the PF tracker on pets2002_p1 and pets2004_p1.
Moreover, the $\ell_1$ tracker also fails on pets2004_p1 and
pets2004-2_p1 due to the unstable target appearances. Our methods, on
the contrary, do much better than the competitors and handle
some intractable sequences (e.g., pets2004_p1 and
pets2004-2_p1) very smoothly (with TSP > 65%). In particular,
for the fixed-camera scenes, RTCST-B is applied and always
achieves the highest accuracy. The superiority of RTCST-B
over all the other trackers confirms our assumption that higher
accuracy is achieved when tracking is treated
as a binary classification problem.

Besides the TSP values, video frames with the tracked
regions are shown in Figure 8, while the tracking errors
against the frame index are plotted in Figure 9.

In Figure 8, only the best result (the one with the highest average TSP
value) is shown for each tracker. The
explicit tracking results support the statistics in Table III.
RTCST beats the KMS tracker and the PF tracker on cubicle, car4,
pets2000_c1 and pets2002_p1 and obtains performance similar
to that of its competitors on dp. Equipped with CSBM,
RTCST-B always achieves the highest accuracy when it is applicable.
In contrast, the traditional trackers fail in some complex
scenarios, e.g., the PF tracker on car4 and pets2002_p1 and the KMS
tracker on car4 and pets2002_p1.

From the error curves shown in Figure 9, we can see that
our methods beat the other visual tracking algorithms on most
video sequences except dp and fish. Given that all the trackers
perform similarly on dp and that the video fish is generated with
deliberately added extreme illumination variation,
RTCST and RTCST-B can be considered better than their
competitors in terms of accuracy.

To evaluate the new measurement, the TSP curves for
cubicle and pets2002_p1 are also given in Figure 9(k)
and Figure 9(l). We can see that the TSP value and the tracking
error change in opposite directions, as expected. However, based
on TSP, we can verify the capacity of a single tracker without
any "reference tracker". This is hard to achieve based on
tracking error.

D. Tracking Efficiency 

Efficiency plays a vital role in real-time visual tracking
applications. We record the elapsed time of each tracker in
our experiment. The time consumption (in ms) for processing
one frame by each tracking algorithm is reported in Table IV.
In the table, huge differences in tracking speed are observed.
The KMS tracker is the most efficient: even its slowest
run takes only 83 ms per frame (83 mspf). On the contrary, the
$\ell_1$ tracker (both the C-based version and the Matlab-based version)
is consistently slower than 14000 mspf due to its high
computational complexity. Equipped with OMP and
dimension reduction, RTCST and RTCST-B are able
to accelerate the original CS-based tracker by 117.3 (dp) to
6271.2 (pets2004_p1) times. The speed range for RTCST is
54 - 968 mspf while that for RTCST-B is 85 - 534 mspf.
The PF tracker shows unstable efficiency across the tests. Its
running speed varies from 37 to 1727 mspf for the experiment
with 500 particles. Supposing that the speed threshold for
real-time application is 100 mspf, most of the traditional methods
and a part of our methods are qualified. The $\ell_1$ tracker cannot
be viewed as "real-time" from any perspective.

Moreover, since RTCST and RTCST-B are implemented
in Matlab on a single core, their running speeds could be
increased remarkably by employing C/C++ and
multiple cores. Actually, the speed of the Matlab-based $\ell_1$ tracker is
already raised by 3.7 (pets2004_p1) to 8.4 (cubicle) times in
its C/C++ counterpart even though only one core is used. If
we conservatively predict a 10-fold speed growth, both RTCST
and RTCST-B will be qualified for real-time application in all
the circumstances.
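The speed-up factors and the real-time projection quoted above are simple ratios of the per-frame times in Table IV, which the short Python check below reproduces. It is only a sanity-check sketch: the helper name is hypothetical, and pairing the 3.7e5 ms Matlab $\ell_1$ entry with the 59 ms R-D50-Hash/PN100 entry is an assumption made here about which table cells produce the 6271-fold figure. Table entries are rounded, so the ratios are approximate.

```python
def speedup(baseline_ms, accelerated_ms):
    """How many times faster the accelerated tracker processes one frame."""
    return baseline_ms / accelerated_ms


# Per-frame times (ms) as printed in Table IV.
l1_matlab_pets2004 = 3.7e5    # Matlab l1 tracker, pets2004_p1
rtcst_hash50_pets2004 = 59.0  # R-D50-Hash, PN100, pets2004_p1 (assumed pairing)
l1_matlab_cubicle = 2.7e5     # Matlab l1 tracker, cubicle
l1_cpp_cubicle = 3.2e4        # C++ l1 tracker, cubicle

best_speedup = speedup(l1_matlab_pets2004, rtcst_hash50_pets2004)  # about 6271
cpp_gain = speedup(l1_matlab_cubicle, l1_cpp_cubicle)              # about 8.4

# Projection under the text's assumed 10-fold gain from a C/C++ port.
REAL_TIME_MS = 100.0
ASSUMED_GAIN = 10.0
slowest_rtcst_ms = 968.0    # upper end of the RTCST range
slowest_rtcst_b_ms = 534.0  # upper end of the RTCST-B range
rtcst_ok = slowest_rtcst_ms / ASSUMED_GAIN < REAL_TIME_MS
rtcst_b_ok = slowest_rtcst_b_ms / ASSUMED_GAIN < REAL_TIME_MS
```

Under the 10-fold assumption even the slowest RTCST configuration would drop to roughly 97 mspf, just inside the 100 mspf threshold.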



E. Tracking Robustness 

As mentioned before, no trick is played in selecting the
initial target region. The first region $R$ should always be the
minimum bounding box that covers the whole target. Nonetheless,
the bounding box can only be obtained manually, and hence
approximately. In practice, selection error is unavoidable.
If the visual tracker is not robust enough, a minor selection error
leads to a massive deviation in tracking
performance. We design a new experiment to test the robustness
of the tracking algorithms. In every repetition of the experiment,
a fluctuation vector $\delta = [\delta_l, \delta_t, \delta_s]$ is generated randomly as

$$\delta_l \sim \mathcal{N}(0, \omega), \quad \delta_t \sim \mathcal{N}(0, \omega), \quad \delta_s \sim \mathcal{N}(0, \ldots)$$

where $\omega$ is a preset standard deviation with a small value. The
original bounding box $R = [l, r, t, b]$ is then perturbed by $\delta$ to
obtain a fluctuated rectangle region $R^*$ as

$$R^* = [l^*, r^*, t^*, b^*]$$

where $l^*$, $r^*$, $t^*$ and $b^*$ are the new coordinates, which are
defined as

$$l^* = l + \delta_l, \qquad t^* = t + \delta_t$$
$$r^* = (1 + \delta_s) \cdot (r - l) + l + \delta_l$$
$$b^* = (1 + \delta_s) \cdot (b - t) + t + \delta_t.$$

The tracking is then conducted based on $R^*$. This procedure is
repeated 100 times for each tracker. Afterwards, the mean
$\bar{T}$ and standard deviation $T_{std}$ of the TSP values are calculated
for each frame. Finally, we plot the TSP band, which is a
band that changes along with the frame index and covers the range
$[\bar{T} - T_{std}, \bar{T} + T_{std}]$, for every visual tracker.
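The perturbation of the initial box and the construction of the TSP band can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: `omega_s` (the standard deviation of the scale fluctuation) is a hypothetical parameter, since its value is not legible in the text, and the population standard deviation is used for the band.

```python
import random
import statistics


def perturb_box(box, omega, omega_s, rng):
    """Shift and rescale box = (l, r, t, b) by a random fluctuation vector.
    omega is the standard deviation of the shifts; omega_s (hypothetical)
    is the standard deviation of the relative scale change."""
    l, r, t, b = box
    d_l = rng.gauss(0.0, omega)
    d_t = rng.gauss(0.0, omega)
    d_s = rng.gauss(0.0, omega_s)
    l_new = l + d_l
    t_new = t + d_t
    r_new = (1.0 + d_s) * (r - l) + l + d_l
    b_new = (1.0 + d_s) * (b - t) + t + d_t
    return (l_new, r_new, t_new, b_new)


def tsp_band(runs):
    """runs[k][f] is the TSP of repetition k at frame f.
    Returns one (mean - std, mean + std) pair per frame."""
    band = []
    for per_frame in zip(*runs):
        mean = statistics.mean(per_frame)
        std = statistics.pstdev(per_frame)  # population std: an assumption
        band.append((mean - std, mean + std))
    return band


rng = random.Random(0)                 # fixed seed for repeatability
box = (10.0, 50.0, 20.0, 80.0)         # made-up initial box
noisy_box = perturb_box(box, omega=2.0, omega_s=0.05, rng=rng)
band = tsp_band([[0.9, 0.8], [0.7, 0.8]])  # two repetitions, two frames
```

Note that the same scale factor $(1 + \delta_s)$ is applied to width and height, so the perturbed box keeps the aspect ratio of the original one.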

The new experiment is carried out on video sequence
pets2000_c1, and the TSP bands are shown in Figure 7.
An ideal TSP band should have a small variance and be centered
around a relatively high mean. We can see in Figure 7 that
RTCST and the KMS tracker show similar variance, but RTCST
has a higher TSP mean. The PF tracker shows a smaller variance
but suffers from very low accuracy. RTCST-B has
the highest average TSP value while still achieving the smallest
standard deviation. The experimental result exhibits the unstable
nature of the KMS tracker with respect to the initial target position.
Meanwhile, it also confirms our conjecture that
high robustness is obtained when background information is taken into
consideration.



Fig. 7: Robustness verification for visual trackers on pets2000_c1 (horizontal axis: frame index). The semi-transparent patches stand
for the TSP bands of the trackers. Note that here RTCST and RTCST-B are run with
D-100 features generated via random projection and 200 particles; the PF tracker
uses 500 particles.



VI. Conclusion and Future Directions 

In this paper, two enhanced CS-based visual tracking
algorithms, namely RTCST and RTCST-B, are proposed. A
customized OMP algorithm is designed to facilitate the proposed
tracking algorithms. Hash kernel and random projection are
employed to reduce the feature dimension of the tracking
application. In RTCST-B, a CS-based background model, which is
termed CSBM, is utilized instead of noise templates. The new
trackers achieve significantly higher efficiency compared with
their prototype, the $\ell_1$ tracker. The remarkable speed growth,
which is up to 6271 times, makes CS-based visual trackers
qualified for real-time applications. Meanwhile, our methods
also obtain higher accuracy than off-the-shelf tracking
algorithms, i.e., the PF tracker and the KMS tracker. In particular, RTCST-B
consistently achieves the highest accuracy and robustness thanks
to the exploitation of background information. In short,
the proposed RTCST and RTCST-B are sufficiently fast for
real-time visual tracking and more accurate and robust than
conventional trackers.

For future topics, we believe that one low-hanging fruit
is employing the trick mentioned in [22] by Tropp et al. to
accelerate the OMP procedure further. Another promising
direction is to take color information into consideration,
because in many scenarios color-based classification is more
discriminative than the intensity-based one. The third direction
of future research is treating different parts of the target, e.g., the
top-left quarter and the middle-bottom quarter, as different classes.
As a result, a multi-class classification is conducted within the CS
framework. The obtained likelihood for each particle then
becomes a vector comprised of the confidences associated with
the various target parts. Because the time consumption for binary
and multi-class classification is the same when using the CS-based
approach, we actually obtain more information at the same cost.
If we can find a reasonable way to exploit the extra information
for tracking, more accurate and robust results are likely to be
obtained.

TABLE IV: Running time of visual trackers for one frame (ms). Note that every time consumption based on a Matlab implementation is labeled by the symbol *. The notations of
algorithm names are the same as those used in Table III.

Tracker      | PN    | cubicle       | dp            | car4          | car11         | fish          | pets2000_c1   | pets2001_c1   | pets2002_p1   | pets2004_p1   | pets2004-2_p1
KMS          |       | 22 ±          | 17 ±          | 60 ±          | 15 ±          | 36 ±          | 83 ±          | 40 ±          | 22 ±          | 21 ±          | 31 ±
PF           | PN100 | 18 ±          | 17 ±          | 173 ±         | 22 ±          | 39 ±          | 199 ±         | 35 ±          | 28 ±          | 53 ±          | 381 ±
PF           | PN200 | 27 ±          | 20 ±          | 321 ±         | 32 ±          | 56 ±          | 279 ±         | 37 ±          | 44 ±          | 82 ±          | 734 ±
PF           | PN500 |               |               |               |               |               |               |               |               |               | 1727 ±
R-D25-Rand   | PN100 | 84 ± 3*       | 100 ± 4*      | 115 ± 2*      | 103 ± 4*      | 114 ± 4*      | 105 ± 3*      | 103 ± 4*      | 131 ± 3*      | 109 ± 2*      | 117 ± 3*
R-D25-Rand   | PN200 | 148 ± 9*      |               | 193 ± 5*      | 186 ± 11*     | 198 ± 15*     | 186 ± 9*      | 177 ± 11*     | 223 ± 8*      | 168 ± 7*      | 198 ± 5*
R-D50-Rand   | PN100 | 155 ± 4*      | 171 ± 4*      | 189 ± 4*      | 197 ± 5*      | 168 ± 10*     | 192 ± 7*      | 187 ± 5*      | 188 ± 5*      | 169 ± 5*      | 167 ± 4*
R-D50-Rand   | PN200 |               |               | 337 ± 13*     |               |               |               |               |               |               |
R-D100-Rand  | PN100 | 477 ± 21*     | 474 ± 17*     | 480 ± 23*     | 535 ± 10*     | 435 ± 32*     | 473 ± 11*     | 496 ± 26*     | 478 ± 18*     | 439 ± 17*     | 481 ± 13*
R-D100-Rand  | PN200 |               |               |               | 968 ± 71*     |               |               |               |               | 681 ± 58*     |
R-D25-Hash   | PN100 | 91 ± 3*       | 92 ± 4*       | 109 ± 3*      | 109 ± 3*      | 103 ± 5*      | 110 ± 4*      | 108 ± 4*      | 131 ± 3*      | 102 ± 3*      | 108 ± 2*
R-D25-Hash   | PN200 |               |               |               |               |               |               |               |               |               |
R-D50-Hash   | PN100 | 57 ± 1*       | 54 ± 1*       | 67 ± 1*       | 62 ± 2*       | 56 ± 1*       | 70 ± 1*       | 65 ± 1*       | 70 ± 1*       | 59 ± 2*       | 59 ± 1*
R-D50-Hash   | PN200 | 96 ± 2*       | 96 ± 2*       | 118 ± 1*      | 114 ± 2*      | 101 ± 2*      | 118 ± 2*      | 114 ± 3*      | 116 ± 2*      | 109 ± 3*      | 108 ± 1*
R-D100-Hash  | PN100 | 73 ± 1*       | 85 ± 2*       | 87 ± 2*       | 86 ± 1*       | 84 ± 3*       | 96 ± 3*       | 83 ± 2*       | 96 ± 5*       | 78 ± 2*       | 82 ± 1*
R-D100-Hash  | PN200 | 138 ± 4*      | 154 ± 4*      | 148 ± 3*      | 156 ± 2*      | 157 ± 2*      | 162 ± 4*      |               | 169 ± 3*      | 134 ± 2*      | 159 ± 1*
RB-D25-Rand  | PN100 | -             | -             | -             | -             | -             | 167 ± 5*      | 175 ± 6*      | 204 ± 6*      | 142 ± 23*     | 184 ± 4*
RB-D25-Rand  | PN200 | -             | -             | -             | -             | -             | 305 ± 11*     | 330 ± 19*     | 316 ± 26*     | 237 ± 32*     | 331 ± 18*
RB-D50-Rand  | PN100 | -             | -             | -             | -             | -             | 187 ± 7*      | 228 ± 4*      | 222 ± 7*      | 157 ± 33*     | 211 ± 4*
RB-D50-Rand  | PN200 | -             | -             | -             | -             | -             | 389 ± 27*     | 500 ± 29*     | 397 ± 28*     | 295 ± 74*     | 427 ± 21*
RB-D100-Rand | PN100 | -             | -             | -             | -             | -             | 215 ± 4*      | 246 ± 3*      | 248 ± 7*      | 148 ± 36*     | 253 ± 8*
RB-D100-Rand | PN200 | -             | -             | -             | -             | -             | 456 ± 17*     | 534 ± 23*     | 438 ± 38*     | 318 ± 75*     | 461 ± 45*
RB-D25-Hash  | PN100 | -             | -             | -             | -             | -             | 162 ± 7*      | 177 ± 8*      | 180 ± 11*     | 131 ± 27*     | 178 ± 8*
RB-D25-Hash  | PN200 | -             | -             | -             | -             | -             | 274 ± 18*     | 377 ± 28*     | 306 ± 29*     | 227 ± 41*     | 351 ± 15*
RB-D50-Hash  | PN100 | -             | -             | -             | -             | -             | 95 ± 2*       | 88 ± 3*       | 106 ± 2*      | 85 ± 2*       | 94 ± 1*
RB-D50-Hash  | PN200 | -             | -             | -             | -             | -             | 174 ± 5*      | 166 ± 2*      | 176 ± 3*      | 154 ± 8*      | 174 ± 3*
RB-D100-Hash | PN100 | -             | -             | -             | -             | -             | 121 ± 2*      | 106 ± 2*      | 127 ± 3*      | 111 ± 3*      | 114 ± 2*
RB-D100-Hash | PN200 | -             | -             | -             | -             | -             | 220 ± 6*      | 211 ± 7*      | 229 ± 5*      | 207 ± 8*      | 217 ± 5*
L1T-Matlab   |       | 2.7e5 ± 1255* | 8.7e4 ± 1660* |               | 1.8e5 ± 2402* |               |               | 1.6e5 ± 1944* |               | 3.7e5 ± 1857* |
L1T-C++      |       | 3.2e4 ± 506   | 1.4e4 ± 320   |               | 3.8e4 ± 1417  |               |               | 3.4e4 ± 484   |               | 1.02e5 ± 607  |

Fig. 8: Tracking results shown as rectangles for 6 video sequences, namely cubicle, dp, car4, pets2000_c1, pets2002_p1 and pets2004_p1. The symbol #x stands for the x-th frame.
The initial target position is shown in light blue, while the red, green, dark blue and yellow rectangles denote the areas tracked by the KMS tracker, the PF tracker (PN500), RTCST
(D100-Rand-PN200) and RTCST-B (D100-Rand-PN200) respectively. For a given tracker, the illustrated result is the one with the highest TSP value among all the associated
results. The PF tracker extends the tracking region to the whole scene in the later frames on pets2004_p1, which is why the green rectangle cannot be seen in these frames. RTCST and
RTCST-B track similar regions in the last frame on pets2002_p1, so the yellow rectangle covers the blue one.

Fig. 9: The tracking errors and TSP values against the frame index. All the visual trackers employ the optimal parameters, i.e., 500 particles for the PF tracker; 200
particles and dimension 100 for both RTCST and RTCST-B.



